Abstract

In this paper we introduce the intuitive notion of trivergence of probability
distributions (TPD). This notion allows us to calculate the similarity among
triplets of objects. For this computation, we can use well-known measures
of probability divergence such as Kullback-Leibler and Jensen-Shannon.
Divergence measures may be used in Information Retrieval tasks such as
Automatic Text Summarization and Text Classification, among many others.
1. Introduction

A statistical distance defines a measure of distance between two objects. This
measure may be interpreted as a distance between the probability
distributions of two populations. Moreover, a metric on a set $A$ is a function
$d : A \times A \to \mathbb{R}^+$ that, $\forall x, y, z \in A$, respects the
following conditions:

i) $d(x, y) \geq 0$

ii) $d(x, y) = 0$ iff $x = y$

iii) $d(x, y) = d(y, x)$

iv) $d(x, z) \leq d(x, y) + d(y, z)$

Several measures of distance are not considered metrics because they do
not fulfill one or more of these conditions. These measures are known as
divergences. This is the case of the Kullback-Leibler divergence $D_{KL}$,
which in particular violates conditions iii) and iv). On the other hand, the
square root of the Jensen-Shannon divergence $D_{JS}$, the symmetrical version
of the $D_{KL}$ divergence, is a metric.
In this paper we introduce the notion of distance among three objects as a
trivergence $\tau$ of probability distributions. The main idea is based on
intuitive properties of divergences.

The rest of the paper is organized as follows: in Section 2 we outline the
divergences using probability distributions and smoothing. Section 3 introduces
the preliminaries of the notion of trivergence. Sections 4 and 5 compute the
trivergence as a product of divergences and as a compound divergence function.
Finally, Section 6 presents the discussion and the conclusions.

2. Preliminaries: divergences of probability distributions with smoothing

In what follows, we recapitulate the divergence functions of probability
distributions: the Kullback-Leibler divergence [1] and the Jensen-Shannon
symmetrical divergence [2].


2.1. Kullback-Leibler divergence 

The Kullback-Leibler divergence, or relative entropy, is a distance between
two probability distributions $p$ and $q$, defined by the equation:

$$D_{KL}(p\|q) = \sum_{w \in p} p_w \log \frac{p_w}{q_w} \qquad (1)$$


The logarithm is in base 2; we adopt the convention of writing $\log_2$ as $\log$.

Of course, $q_w = 0$ for some items $w$, because not all items of $p$ are in
$q$. In this case, expressions like $p_w \log \frac{p_w}{q_w} \to \infty$ may
occur if $q_w = 0$, i.e. when the item $w \notin q$ (see, for example,
Figure 1). To avoid this situation, a smoothing process is used empirically
to estimate the probability of unseen items. The literature offers several
smoothing techniques, for example Good-Turing, Back-Off, etc. [3]. In this
paper, we will use a very elementary smoothing:

$$q_w = \begin{cases} \dfrac{C_w^q}{|q|} & \text{if } w \in q \\[2mm] \dfrac{1}{|T|} & \text{elsewhere} \end{cases} \qquad (2)$$

where $p$ and $q$ are the probability distributions, $p_w$ is estimated from
its counts in the same way as $q_w$ in equation (2), $C_w^p$ is the number of
occurrences of the item $w \in p$, $C_w^q$ is the number of occurrences of the
item $w \in q$, $|p|$ is the total number of distinct items $\in p$, $|q|$ is
the total number of distinct items $\in q$ and $|T| = |p| + |q|$. Moreover,
we assume that $|p| > |q|$, so the divergence is calculated from $p$ to $q$.

The Kullback-Leibler distance is not a metric in a mathematical sense because,
although $D_{KL}(p\|q) \geq 0$ with equality if and only if $p = q$, it
is not symmetrical and it does not respect the triangle inequality.
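As an illustration of equations (1) and (2), the following Python sketch computes a smoothed Kullback-Leibler divergence between two objects given as lists of items. The function name `smoothed_kl` and the toy data are hypothetical. Normalising each distribution by its total number of occurrences (so that the seen probabilities sum to one) is an assumption of this sketch, not a statement of the paper's implementation.

```python
import math
from collections import Counter


def smoothed_kl(p_items, q_items):
    """Sketch of D_KL(p||q) in bits, with 1/|T| smoothing for items unseen in q."""
    cp, cq = Counter(p_items), Counter(q_items)
    # Assumption: normalise by the total number of occurrences of each object.
    n_p, n_q = sum(cp.values()), sum(cq.values())
    # |T| = |p| + |q|, with |p| and |q| the numbers of distinct items.
    t = len(cp) + len(cq)
    total = 0.0
    for w, count in cp.items():
        p_w = count / n_p
        # Equation (2): relative frequency if w is in q, 1/|T| elsewhere.
        q_w = cq[w] / n_q if w in cq else 1.0 / t
        total += p_w * math.log2(p_w / q_w)
    return total


if __name__ == "__main__":
    p = "a a b c".split()
    q = "a b b".split()
    print(smoothed_kl(p, q), smoothed_kl(q, p))
```

Running it on the toy data gives two different values for the two argument orders, which reflects the lack of symmetry noted above.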
2.2. Jensen-Shannon divergence

The Jensen-Shannon divergence [2], or symmetrical Kullback-Leibler distance,
between two probability distributions $p$ and $q$ over the same alphabet $X$
is defined by the equation:

$$D_{JS}(p\|q) = \frac{1}{2} \sum_{w \in X} \left[ p_w \log \frac{2 p_w}{p_w + q_w} + q_w \log \frac{2 q_w}{p_w + q_w} \right] \qquad (3)$$

with the same conventions for $p$, $q$, $|p|$, $|q|$, $|T|$, $p_w$, $q_w$,
$C_w^p$ and $C_w^q$ as in equation (1), and the same elementary smoothing (2).
The logarithm is also in base 2, with the same convention for $\log_2$.
$\sqrt{D_{JS}}$ is a metric in a mathematical sense.
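Equation (3) with the smoothing of equation (2) applied to both distributions can be sketched in Python as follows. The name `smoothed_js` is hypothetical, and, as an assumption of this sketch, the seen probabilities are normalised by the total number of occurrences of each object.

```python
import math
from collections import Counter


def smoothed_js(p_items, q_items):
    """Sketch of D_JS(p||q) in bits; unseen items get probability 1/|T|."""
    cp, cq = Counter(p_items), Counter(q_items)
    # Assumption: normalise by the total number of occurrences of each object.
    n_p, n_q = sum(cp.values()), sum(cq.values())
    t = len(cp) + len(cq)  # |T| = |p| + |q| (distinct items)
    total = 0.0
    for w in set(cp) | set(cq):
        p_w = cp[w] / n_p if w in cp else 1.0 / t
        q_w = cq[w] / n_q if w in cq else 1.0 / t
        m = p_w + q_w
        total += p_w * math.log2(2 * p_w / m) + q_w * math.log2(2 * q_w / m)
    return 0.5 * total


if __name__ == "__main__":
    p = "a a b c".split()
    q = "a b b".split()
    print(smoothed_js(p, q), smoothed_js(q, p))
```

Because every item of the union contributes the same term whichever object comes first, the two printed values agree up to floating-point rounding.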


3. Trivergence of probability distributions 

In order to define the trivergence between three probability distributions we
will use divergence measures. Let $p$, $q$ and $r$ be three probability
distributions and $T = \{p \cup q \cup r\}$, with cardinality $|T|$. Figure 1
shows the partitioning of the set $T$ into 7 regions.

We define two ways to calculate the trivergence $\tau$: as a product of
divergences and as a compound divergence function:

i) Product of divergences:

$$\tau^{\pi}(p\|q\|r) = \begin{cases} D(p\|q) \cdot D(q\|r) \cdot D(p\|r) \\ D(q\|p) \cdot D(r\|q) \cdot D(r\|p) \end{cases} \qquad (4)$$

ii) Compound divergence function:

$$\tau^{c}(p\|q\|r) = \left\{ \begin{array}{ll} D[p\|D(q\|r)] & D[p\|D(r\|q)] \\ D[q\|D(p\|r)] & D[q\|D(r\|p)] \\ D[r\|D(p\|q)] & D[r\|D(q\|p)] \\ D[D(q\|r)\|p] & D[D(r\|q)\|p] \\ D[D(p\|r)\|q] & D[D(r\|p)\|q] \\ D[D(p\|q)\|r] & D[D(q\|p)\|r] \end{array} \right. \qquad (5)$$

In both cases, if we use the following restriction:

$$|p| > |q| > |r|$$

the definition of trivergence is, in particular, sorted by cardinality. Then,
we have for the product:

$$\tau^{\pi}(p\|q\|r) = D(p\|q) \cdot D(q\|r) \cdot D(p\|r) \qquad (6)$$
Figure 1: Partition of the set $T = \{p \cup q \cup r\}$ into 7 regions.



And for the compound function: 

$$\tau^{c}(p\|q\|r) = D[p\|D(q\|r)] \qquad (7)$$

In order to clarify the weight of the smoothing (equation (2)) for $p_w$, $q_w$
and $r_w$, from Figure 1 we have for each region that:

1. $\{p \setminus \{q \cup r\}\}$: $q_w = r_w = 0$;

2. $\{p \cap r\} \setminus q$: $q_w = 0$;

3. $\{p \cap q\} \setminus r$: $r_w = 0$;

4. $\{p \cap q \cap r\}$: $p_w \neq 0$, $q_w \neq 0$, $r_w \neq 0$;

5. $\{q \setminus \{p \cup r\}\}$: $p_w = r_w = 0$;

6. $\{q \cap r\} \setminus p$: $p_w = 0$;

7. $\{r \setminus \{p \cup q\}\}$: $p_w = q_w = 0$.
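The seven regions listed above can be obtained with plain set operations. The following Python sketch uses a hypothetical helper `partition_regions`; the keys 1 to 7 follow the numbering of the list, and the toy sets are invented for illustration.

```python
def partition_regions(p, q, r):
    """Split T = p ∪ q ∪ r into the seven regions, keyed by their list number."""
    p, q, r = set(p), set(q), set(r)
    return {
        1: p - (q | r),   # only in p: q_w = r_w = 0
        2: (p & r) - q,   # in p and r only: q_w = 0
        3: (p & q) - r,   # in p and q only: r_w = 0
        4: p & q & r,     # in all three: nothing to smooth
        5: q - (p | r),   # only in q: p_w = r_w = 0
        6: (q & r) - p,   # in q and r only: p_w = 0
        7: r - (p | q),   # only in r: p_w = q_w = 0
    }


if __name__ == "__main__":
    regions = partition_regions("abcd", "bcef", "cdfg")
    for number, items in regions.items():
        print(number, sorted(items))
```

The regions are pairwise disjoint and their union is $T$, so each item needs smoothing for exactly the distributions named in its region.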

In this paper, we will use both the Kullback-Leibler $D_{KL}$ [1] and the
Jensen-Shannon $D_{JS}$ [2] divergences in order to calculate the trivergence
$\tau^{\pi,c}(p\|q\|r)$.

4. Distribution using Kullback-Leibler divergence 

4.1. $\tau^{\pi}$ as product of KL divergences

Definition. Let $p$, $q$ and $r$ be three probability distributions where

$$|p| > |q| > |r|$$

and $T = \{p \cup q \cup r\}$, with cardinality $|T|$. The Kullback-Leibler
trivergence between $p$, $q$ and $r$, sorted by their cardinality, is defined
as a product of divergences:

$$\tau^{\pi}_{KL}(p\|q\|r) = D_{KL}(p\|q) \cdot D_{KL}(q\|r) \cdot D_{KL}(p\|r)$$
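A minimal Python sketch of this definition follows. The names `kl_smoothed` and `trivergence_product_kl` are hypothetical. Two assumptions of the sketch are not taken from the paper: seen probabilities are normalised by total occurrences, and ties in cardinality are broken by argument order (the definition itself uses strict inequalities).

```python
import math
from collections import Counter


def kl_smoothed(cp, cq, t):
    """D_KL(p||q) in bits from two Counters; items unseen in q get 1/t."""
    # Assumption: normalise by the total number of occurrences.
    n_p, n_q = sum(cp.values()), sum(cq.values())
    total = 0.0
    for w, count in cp.items():
        p_w = count / n_p
        q_w = cq[w] / n_q if w in cq else 1.0 / t
        total += p_w * math.log2(p_w / q_w)
    return total


def trivergence_product_kl(a, b, c):
    """Product of the three KL divergences after sorting by cardinality."""
    # Sort so that |p| >= |q| >= |r|, where |.| counts distinct items.
    p, q, r = sorted((Counter(a), Counter(b), Counter(c)), key=len, reverse=True)
    t = len(set(p) | set(q) | set(r))  # |T| for T = p ∪ q ∪ r
    return kl_smoothed(p, q, t) * kl_smoothed(q, r, t) * kl_smoothed(p, r, t)


if __name__ == "__main__":
    a = "x x y z w".split()
    b = "x y y z".split()
    c = "x v".split()
    print(trivergence_product_kl(a, b, c))
```

Sorting inside the function makes the result independent of the order in which the three objects are passed, as long as their cardinalities differ.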






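Calculating simultaneously for $p$, $q$ and $r$:

$$D_{KL}(p\|q) = \sum_{w \in p} p_w \log \frac{p_w}{q_w} \qquad (8)$$

$$D_{KL}(q\|r) = \sum_{w \in q} q_w \log \frac{q_w}{r_w} \qquad (9)$$

$$D_{KL}(p\|r) = \sum_{w \in p} p_w \log \frac{p_w}{r_w} \qquad (10)$$

From the equation (8):

$$\sum_{w \in p} p_w \log \frac{p_w}{q_w} = \sum_{w \in p \setminus q} p_w \log \frac{p_w}{q_w} + \sum_{w \in p \cap q} p_w \log \frac{p_w}{q_w}$$

and using the smoothing from the equation (2), each term becomes:

$$p_w \log \frac{p_w}{q_w} = \begin{cases} \dfrac{C_w^p}{|p|} \log \dfrac{|T|\, C_w^p}{|p|} & w \in p \setminus q \quad (\text{smoothed, } q_w = \tfrac{1}{|T|}) \\[3mm] \dfrac{C_w^p}{|p|} \log \dfrac{C_w^p / |p|}{C_w^q / |q|} & w \in \{p \cap q\} \quad (\text{without smoothing}) \end{cases}$$

From the equation (9):

$$\sum_{w \in q} q_w \log \frac{q_w}{r_w} = \sum_{w \in q \setminus r} q_w \log \frac{q_w}{r_w} + \sum_{w \in q \cap r} q_w \log \frac{q_w}{r_w}$$

and using the smoothing from the equation (2):

$$q_w \log \frac{q_w}{r_w} = \begin{cases} \dfrac{C_w^q}{|q|} \log \dfrac{|T|\, C_w^q}{|q|} & w \in q \setminus r \quad (\text{smoothed, } r_w = \tfrac{1}{|T|}) \\[3mm] \dfrac{C_w^q}{|q|} \log \dfrac{C_w^q / |q|}{C_w^r / |r|} & w \in \{q \cap r\} \quad (\text{without smoothing}) \end{cases}$$

From the equation (10):

$$\sum_{w \in p} p_w \log \frac{p_w}{r_w} = \sum_{w \in p \setminus r} p_w \log \frac{p_w}{r_w} + \sum_{w \in p \cap r} p_w \log \frac{p_w}{r_w}$$

and using the smoothing from the equation (2):

$$p_w \log \frac{p_w}{r_w} = \begin{cases} \dfrac{C_w^p}{|p|} \log \dfrac{|T|\, C_w^p}{|p|} & w \in p \setminus r \quad (\text{smoothed, } r_w = \tfrac{1}{|T|}) \\[3mm] \dfrac{C_w^p}{|p|} \log \dfrac{C_w^p / |p|}{C_w^r / |r|} & w \in \{p \cap r\} \quad (\text{without smoothing}) \end{cases}$$

therefore:

$$D_{KL}(p\|q) = \sum_{w \in p \setminus q} \frac{C_w^p}{|p|} \log \frac{|T|\, C_w^p}{|p|} + \sum_{w \in p \cap q} \frac{C_w^p}{|p|} \log \frac{C_w^p / |p|}{C_w^q / |q|} \qquad (14)$$

$$D_{KL}(q\|r) = \sum_{w \in q \setminus r} \frac{C_w^q}{|q|} \log \frac{|T|\, C_w^q}{|q|} + \sum_{w \in q \cap r} \frac{C_w^q}{|q|} \log \frac{C_w^q / |q|}{C_w^r / |r|} \qquad (15)$$

$$D_{KL}(p\|r) = \sum_{w \in p \setminus r} \frac{C_w^p}{|p|} \log \frac{|T|\, C_w^p}{|p|} + \sum_{w \in p \cap r} \frac{C_w^p}{|p|} \log \frac{C_w^p / |p|}{C_w^r / |r|} \qquad (16)$$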
4.2. $\tau^{c}$ as compound divergence function

Definition. Let $p$, $q$ and $r$ be three probability distributions where

$$|p| > |q| > |r|$$


and $T = \{p \cup q \cup r\}$, with cardinality $|T|$. The Kullback-Leibler
trivergence between $p$, $q$ and $r$, sorted by their cardinality, is defined
as a compound divergence function:

$$\tau^{c}_{KL}(p\|q\|r) = D_{KL}\left[\, p \,\Big\|\, \frac{D_{KL}(q\|r)}{|\cdot|} \,\right]$$

We computed $\frac{D_{KL}(q\|r)}{|\cdot|}$ (the cardinality in the denominator
is illegible in the source) in order to consider this fraction as a probability.
Firstly, we calculate:

$$D_{KL}(q\|r) = \sum_{w \in q} q_w \log \frac{q_w}{r_w}$$

however, $\sum_{w \in q} q_w \log \frac{q_w}{r_w}$ is defined by equation (15);
therefore, using a smoothing in the case of unseen events:


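[Equation (17): piecewise definition of $\tau^{c}_{KL}(p\|q\|r)$, with separate
cases for $w \in p \cap q$, for $w \in p \setminus q$ (smoothed with $|T|$), and
for $D_{KL}(q\|r) = 0$; the individual terms are not recoverable from the source.]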


5. Distribution using Jensen-Shannon divergence 

5.1. $\tau^{\pi}$ as product of JS divergences

Definition. Let $p$, $q$ and $r$ be three probability distributions where

$$|p| > |q| > |r|$$

and $T = \{p \cup q \cup r\}$, with cardinality $|T|$. The Jensen-Shannon
trivergence between $p$, $q$ and $r$, sorted by their cardinality, is defined
as a product of divergences:

$$\tau^{\pi}_{JS}(p\|q\|r) = D_{JS}(p\|q) \cdot D_{JS}(q\|r) \cdot D_{JS}(p\|r)$$
We defined:

$$P_w^{pq} = p_w \log \frac{2 p_w}{p_w + q_w}; \qquad Q_w^{pq} = q_w \log \frac{2 q_w}{p_w + q_w}$$

$$Q_w^{qr} = q_w \log \frac{2 q_w}{q_w + r_w}; \qquad R_w^{qr} = r_w \log \frac{2 r_w}{q_w + r_w}$$

$$R_w^{pr} = r_w \log \frac{2 r_w}{r_w + p_w}; \qquad P_w^{pr} = p_w \log \frac{2 p_w}{r_w + p_w}$$

Calculating simultaneously for $p$, $q$ and $r$:

$$\begin{aligned} D_{JS}(p\|q) &= \frac{1}{2} \sum_{w \in \{p \cup q\}} \{P_w^{pq} + Q_w^{pq}\} \\ D_{JS}(q\|r) &= \frac{1}{2} \sum_{w \in \{q \cup r\}} \{Q_w^{qr} + R_w^{qr}\} \\ D_{JS}(p\|r) &= \frac{1}{2} \sum_{w \in \{p \cup r\}} \{P_w^{pr} + R_w^{pr}\} \end{aligned} \qquad (18)$$
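For $2 D_{JS}(p\|q)$ we have:

$$\sum_{w \in p \cup q} \{P_w^{pq} + Q_w^{pq}\} = \sum_{w \in p \setminus q} \{P_w^{pq} + Q_w^{pq}\} + \sum_{w \in p \cap q} \{P_w^{pq} + Q_w^{pq}\} + \sum_{w \in q \setminus p} \{P_w^{pq} + Q_w^{pq}\}$$

and using the smoothing for $p_w$ and $q_w$ from the equation (2),
$q_w = \frac{1}{|T|}$ for $w \in p \setminus q$ and $p_w = \frac{1}{|T|}$ for
$w \in q \setminus p$; for instance:

$$Q_w^{pq} = \frac{C_w^q}{|q|} \log \frac{2\, |T|\, C_w^q}{|T|\, C_w^q + |q|}; \qquad p_w = \frac{1}{|T|}, \quad w \in q \setminus p$$

For $2 D_{JS}(q\|r)$ we have:

$$\sum_{w \in q \cup r} \{Q_w^{qr} + R_w^{qr}\} = \sum_{w \in q \setminus r} \{Q_w^{qr} + R_w^{qr}\} + \sum_{w \in q \cap r} \{Q_w^{qr} + R_w^{qr}\} + \sum_{w \in r \setminus q} \{Q_w^{qr} + R_w^{qr}\}$$

and using the smoothing for $q_w$ and $r_w$ from the equation (2),
$r_w = \frac{1}{|T|}$ for $w \in q \setminus r$ and $q_w = \frac{1}{|T|}$ for
$w \in r \setminus q$.

Finally, for $2 D_{JS}(p\|r)$ we have:

$$\sum_{w \in p \cup r} \{P_w^{pr} + R_w^{pr}\} = \sum_{w \in p \setminus r} \{P_w^{pr} + R_w^{pr}\} + \sum_{w \in p \cap r} \{P_w^{pr} + R_w^{pr}\} + \sum_{w \in r \setminus p} \{P_w^{pr} + R_w^{pr}\}$$

Using the smoothing for $p_w$ and $r_w$ from the equation (2),
$r_w = \frac{1}{|T|}$ for $w \in p \setminus r$ and $p_w = \frac{1}{|T|}$ for
$w \in r \setminus p$.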
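The region-wise split of $2 D_{JS}$ used in this section can be checked numerically. In the Python sketch below, `js_terms` and `two_djs_by_region` are hypothetical names, `t` stands for $|T|$ and is passed in by the caller, and normalising seen probabilities by total occurrences is an assumption of the sketch.

```python
import math
from collections import Counter


def js_terms(p_w, q_w):
    """P_w^{pq} + Q_w^{pq} for a single item, logarithm in base 2."""
    m = p_w + q_w
    return p_w * math.log2(2 * p_w / m) + q_w * math.log2(2 * q_w / m)


def two_djs_by_region(p_items, q_items, t):
    """Return the three partial sums of 2*D_JS(p||q): p\\q, p∩q and q\\p."""
    cp, cq = Counter(p_items), Counter(q_items)
    # Assumption: normalise by the total number of occurrences.
    n_p, n_q = sum(cp.values()), sum(cq.values())
    # w in p \ q: q_w is smoothed to 1/t.
    only_p = sum(js_terms(cp[w] / n_p, 1.0 / t) for w in cp if w not in cq)
    # w in p ∩ q: no smoothing is needed.
    both = sum(js_terms(cp[w] / n_p, cq[w] / n_q) for w in cp if w in cq)
    # w in q \ p: p_w is smoothed to 1/t.
    only_q = sum(js_terms(1.0 / t, cq[w] / n_q) for w in cq if w not in cp)
    return only_p, both, only_q


if __name__ == "__main__":
    parts = two_djs_by_region("a a b c".split(), "b b d".split(), t=4)
    print(parts, 0.5 * sum(parts))
```

Half of the sum of the three parts gives the smoothed $D_{JS}(p\|q)$; the same function applies to the pairs $(q, r)$ and $(p, r)$.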
5.2. $\tau^{c}$ as compound divergence function

Definition. Let $p$, $q$ and $r$ be three probability distributions where

$$|p| > |q| > |r|$$

$T = \{p \cup q \cup r\}$, with cardinality $|T|$, and $QR = \{q \cup r\}$, with
cardinality $|QR|$. The Jensen-Shannon trivergence between $p$, $q$ and $r$,
sorted by their cardinality, is defined as a compound divergence function:

$$\tau^{c}_{JS}(p\|q\|r) = D_{JS}\left[\, p \,\Big\|\, \frac{D_{JS}(q\|r)}{|QR|} \,\right]$$


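We computed $\frac{D_{JS}(q\|r)}{|QR|}$ in order to consider this fraction as a
probability.

First, we calculate:

$$D_{JS}(q\|r) = \frac{1}{2} \sum_{w \in \{q \cup r\}} \left[ q_w \log \frac{2 q_w}{q_w + r_w} + r_w \log \frac{2 r_w}{q_w + r_w} \right]$$

nevertheless, $D_{JS}(q\|r)$ is defined by equation (22); therefore, using a
smoothing in the case of unseen events:

[Equation: piecewise definition of $\tau^{c}_{JS}(p\|q\|r)$ in terms of
$\frac{D_{JS}(q\|r)}{|QR|}$, with a special case if $p_x = 0$; the individual
terms are not recoverable from the source.]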


6. Conclusions 

The main contribution of this paper is the formalisation of the definition
of the smoothed Trivergence of Probability Distributions (TPD). The trivergence
of three objects, represented as probability distributions, was calculated using
elementary divergence functions (KL and JS). We have proposed two ways
to compute the smoothed TPD. The first one uses a product of divergences and
the second one uses a compound divergence function. Divergence measures
have been used in Automatic Text Summarization tasks, among many others.